Basics of Programming: conditional statements, loops, custom functions

Enrico Toffalini

Conditional Programming

Conditional statements such as if, if...else, and if...elif...else are essential in Python for controlling the analysis pipeline and automating tasks and decisions. The logic closely resembles that of R but, as seen previously, Python uses indentation (not curly or round brackets) to define blocks of code.


if statement

Performs an action only if a condition is met:

age = 20

if age >= 18:
    print("Adult")
Adult

[Figure: basic flowchart showing the logic of the if statement]

if...else statement

Sometimes you need to perform alternative, mutually-exclusive actions:

age = 15

if age >= 18:
    print("Adult")
else:
    print("Minor")
Minor

Note that indentation is really important!

age = 15

if age >= 18:
    print("Adult")
else:
print("Minor")
expected an indented block after 'else' statement on line 5 (<string>, line 6)

if...elif...else statement

When you need to evaluate more than two alternative conditions, you can chain them with if...elif...else. Conditions are checked in order, and only the block of the first one that is True is executed:

age = 10

if age >= 18:
    print("Adult")
elif age >= 13:
    print("Adolescent")
elif age >= 2:
    print("Child")
else:
    print("Infant")
Child
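Conditions can also be combined with and, or, and not, or written as a chained comparison. A minimal sketch (the variable name label and the age range are illustrative assumptions, not part of the original example):

```python
age = 15

# Chained comparison: equivalent to (age >= 13) and (age < 18)
if 13 <= age < 18:
    label = "Adolescent"
else:
    label = "Not an adolescent"

print(label)
```

For a single value, and/or/not are the right tools; the elementwise operators & and ~ are meant for arrays.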

Example: Preplanned Analysis

Example of automated decision in a hypothetical pre-registered analysis pipeline:

import numpy as np
import scipy.stats as st

x1 = np.random.normal(0, 1, size=30)
x2 = np.random.normal(0.5, 1, size=30)

tt = st.ttest_rel(x1, x2)
print(tt.pvalue.round(4))
0.8161
if tt.pvalue < 0.05:
    print("Significant result: proceeding with follow-up analysis")
    # Here you could perform other analyses after the preliminary check
else:
    print("No significant result: reporting preliminary test only")
No significant result: reporting preliminary test only

Vectorized and Nested conditions

All previous examples evaluated a single condition that is either True or False. However, you often want to apply the same logic to an entire vector, and a plain if does not work there:

agesVector = np.array([2, 28, 15, 11, 4, 67, 0, 42, 14, 8])

if agesVector >= 18:
    print("Adult")
else:
    print("Minor")
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The error message suggests using a.any() or a.all(), that is, np.any(agesVector >= 18) or np.all(agesVector >= 18), but this is not what I want! What I want is an if...else evaluated elementwise across a whole vector of Trues and Falses (like ifelse() in R).
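To see why a plain if fails here, it may help to look at what the comparison actually returns. A small sketch (the name mask is an illustrative assumption):

```python
import numpy as np

agesVector = np.array([2, 28, 15, 11, 4, 67, 0, 42, 14, 8])

mask = agesVector >= 18   # elementwise comparison: one boolean per element
print(mask)

print(mask.any())  # True if at least one element is >= 18
print(mask.all())  # True only if every element is >= 18
print(mask.sum())  # number of elements that are >= 18
```

any() and all() collapse the whole vector into a single True/False, which is why they do not give one label per person.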

Vectorized and Nested conditions with np.where() and np.select()

agesVector = np.array([2, 28, 15, 11, 4, 67, 0, 42, 14, 8])

np.where(agesVector >= 18, "Adult", "Minor")
array(['Minor', 'Adult', 'Minor', 'Minor', 'Minor', 'Adult', 'Minor',
       'Adult', 'Minor', 'Minor'], dtype='<U5')

np.where() handles a single condition, similar to ifelse() in R.

conditions = [agesVector >= 18, agesVector >= 13, agesVector >= 2]
choices = ["Adult", "Adolescent", "Child"]

np.select(conditions, choices, default="Infant")
array(['Child', 'Adult', 'Adolescent', 'Child', 'Child', 'Adult',
       'Infant', 'Adult', 'Adolescent', 'Child'], dtype='<U10')

np.select() handles multiple conditions, evaluated in order (the first one that is True wins). There is no direct equivalent in base R; the closest is dplyr::case_when().
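Because np.select() uses the first condition that is True for each element, the order of the conditions matters. A minimal sketch with hypothetical ages (the names ages, good and bad are assumptions for illustration):

```python
import numpy as np

ages = np.array([30, 15, 5, 1])  # hypothetical example values

# Most restrictive condition first: each element gets the intended label
good = np.select([ages >= 18, ages >= 13, ages >= 2],
                 ["Adult", "Adolescent", "Child"], default="Infant")

# Reversed order: ages >= 2 is matched first by almost everyone
bad = np.select([ages >= 2, ages >= 13, ages >= 18],
                ["Child", "Adolescent", "Adult"], default="Infant")

print(good)  # Adult, Adolescent, Child, Infant
print(bad)   # Child, Child, Child, Infant
```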

Loops in Python

Loops are used in Python to repeat actions. The for and while loops are the most common.

for loop basics

for i in range(5):
    print(i)
0
1
2
3
4
for i in range(5):
    print(i**2)
0
1
4
9
16
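range() also accepts a start and a step, and enumerate() gives you the index together with each element. A short sketch (variable names are illustrative assumptions):

```python
# range(start, stop, step): stop is always excluded
one_to_five = list(range(1, 6))      # 1 to 5, like 1:5 in R
evens = list(range(0, 10, 2))        # even numbers below 10
print(one_to_five, evens)

# enumerate() yields (index, element) pairs, with the index starting at 0
pairs = []
for i, name in enumerate(["a", "b", "c"]):
    pairs.append((i, name))
print(pairs)
```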

Time-based iteration

import time

for i in range(5):
    print(time.time())
    time.sleep(1)
1746265534.0207891
1746265535.0219188
1746265536.0230012
1746265537.0247726
1746265538.0253975

Monte Carlo Simulation 😃

Repeat a data simulation to estimate the standard error of the mean:

import numpy as np

N = 30
niter = 10
np.random.seed(0) # set seed for reproducibility: best practice! 
results = np.empty(niter) # initialize empty vector: best practice!

for i in range(niter):
    x = np.random.normal(size=N)
    results[i] = x.mean()

print(results.round(4))
[ 0.4429 -0.2895 -0.1337  0.5108  0.0965 -0.0672 -0.1006 -0.0776 -0.304
  0.1978]
print(np.std(results).round(4))
0.267

Monte Carlo Simulation 😃

# STEP 1: RUN SIMULATION

import numpy as np

N = 30
niter = 10000

np.random.seed(0)
results = np.empty(niter) 

for i in range(niter):
    x = np.random.normal(size=N)
    results[i] = x.mean()

# STEP 2: ESTIMATE STANDARD ERROR

print(np.std(results).round(4))
0.1814
# STEP 3: PLOT RESULTS

import matplotlib.pyplot as plt

plt.hist(results, bins=50)

plt.xticks(fontsize=16)
plt.xlabel("mean", fontsize=16)

plt.show()
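As a sanity check, the simulated standard error can be compared with the analytic value sigma / sqrt(N), which is about 0.1826 for sigma = 1 and N = 30. The sketch below is a vectorized variant of the loop, not the original code: it assumes the newer np.random.default_rng() generator, so the simulated number will not match the seeded output above exactly.

```python
import numpy as np

N = 30
niter = 10000

rng = np.random.default_rng(0)  # assumption: Generator API instead of np.random.seed

# Draw all samples at once: one row per iteration, one column per observation
samples = rng.normal(size=(niter, N))
means = samples.mean(axis=1)

se_simulated = means.std()
se_analytic = 1 / np.sqrt(N)  # sigma / sqrt(N) with sigma = 1

print(round(se_simulated, 4), round(se_analytic, 4))
```

The two values should agree to about two decimal places; the agreement improves as niter grows.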

Iterating over Elements

Iterating over a sequence of integers (e.g., “for i in range(niter)”) is common practice; however, you can also iterate directly over the elements of a list or other data structures:

words = ["this", "is", "a", "vector", "of", "strings"]

for w in words:
    print(w.upper()*2)
THISTHIS
ISIS
AA
VECTORVECTOR
OFOF
STRINGSSTRINGS

List comprehension is another, compact type of for loop over list elements:

[w.upper()*2 for w in words]
['THISTHIS', 'ISIS', 'AA', 'VECTORVECTOR', 'OFOF', 'STRINGSSTRINGS']
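A comprehension can also filter elements with a trailing if, and the same syntax works for dictionaries. A small sketch (the names long_words and lengths are illustrative assumptions):

```python
words = ["this", "is", "a", "vector", "of", "strings"]

# Keep only words longer than two characters, then transform them
long_words = [w.upper() for w in words if len(w) > 2]
print(long_words)  # ['THIS', 'VECTOR', 'STRINGS']

# Dictionary comprehension: map each word to its length
lengths = {w: len(w) for w in words}
print(lengths["vector"])  # 6
```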

while loop

The while loop is another classic iterative structure. It is useful when the precise number of iterations is not known in advance: the loop keeps running as long as a condition remains True, and stops as soon as it becomes False.

amount = 1000
month = 0
interest_rate = 0.001

while amount < 1500:
    month += 1
    amount += amount * interest_rate

print(month)
406
Interpretation: it takes 406 months to reach an amount of €1,500 when starting with an amount of €1,000 with a 0.1% monthly interest rate.
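In this particular case the answer can also be checked without a loop, by solving 1000 × (1 + 0.001)^n ≥ 1500 for the smallest integer n. A minimal sketch (the names start, target, rate and n_months are assumptions for illustration):

```python
import math

start, target, rate = 1000, 1500, 0.001

# Smallest integer n such that start * (1 + rate)**n >= target
n_months = math.ceil(math.log(target / start) / math.log(1 + rate))
print(n_months)  # 406
```

The while loop remains the more general tool, since most stopping conditions have no closed-form solution.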

break in loops

The break statement lets you interrupt any loop when a condition is met:

import time
import numpy as np
import scipy.stats as st

i = 0
pval = 1
Start = time.time()
while pval >= 0.001: # go on until p < 0.001
  i += 1
  x1 = np.random.normal(0,1,size=30)
  x2 = np.random.normal(0,1,size=30)
  tt = st.ttest_rel(x1, x2)
  pval = tt.pvalue
  Now = time.time()
  if Now - Start > 10: 
    break            # however, stop if overall time exceeds 10 seconds
print([i, pval.round(4)])
[525, np.float64(0.0008)]
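break works the same way inside a for loop. A deterministic sketch with hypothetical data (values, threshold and first_above are illustrative names):

```python
values = [3, 8, 1, 12, 7, 20]  # hypothetical data
threshold = 10

first_above = None
for position, v in enumerate(values):
    if v > threshold:
        first_above = (position, v)
        break  # stop at the first match; later elements are never inspected

print(first_above)  # (3, 12)
```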

Other iteration: for with zip()

zip() pairs up elements from multiple sequences while iterating over them:

teacher = ["Pastore", "Granziol", "Feraco","Altoe"]
course = ["CurrentIssues", "BasicsInference", "SEM","Outliers"]
hours = [10, 20, 20, 5]

for t, c, h in zip(teacher, course, hours):
    print(f"{t} teaches {c}, which has {h} hours")
Pastore teaches CurrentIssues, which has 10 hours
Granziol teaches BasicsInference, which has 20 hours
Feraco teaches SEM, which has 20 hours
Altoe teaches Outliers, which has 5 hours
base = [5, 10, 10, 2, 7, 15]
exponent = [2, 1, 2, 5, 5, 2]

result = [b**e for b, e in zip(base, exponent)]
print(result)
[25, 10, 100, 32, 16807, 225]
But note that the latter could be obtained much more easily with numpy vectorized operations: np.array(base) ** np.array(exponent)
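The two approaches can be compared directly. The sketch below also shows one behaviour of zip() worth knowing: it silently stops at the shortest input (the names looped and vectorized are illustrative assumptions):

```python
import numpy as np

base = [5, 10, 10, 2, 7, 15]
exponent = [2, 1, 2, 5, 5, 2]

looped = [b**e for b, e in zip(base, exponent)]
vectorized = np.array(base) ** np.array(exponent)

print(vectorized)
print(looped == vectorized.tolist())  # True

# zip() stops at the shortest input without raising an error
print(list(zip([1, 2, 3], ["a", "b"])))  # [(1, 'a'), (2, 'b')]
```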

Other iteration: map()

map() applies a specific function to each item in a sequence:

result = map(len, ["apple", "banana", "pear", "watermelon"]) 

result = list(result)
print(result)
[5, 6, 4, 10]

With map(), you need to wrap the result in list(...) to actually generate the values; otherwise you get a non-evaluated, “lazy” map object.

map() is roughly equivalent to lapply()/sapply() in R, while iterating with zip() is closer to mapply().
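map() also accepts several sequences at once, passing one element from each to the function, which is similar in spirit to mapply() in R. A small sketch (variable names are illustrative assumptions):

```python
base = [5, 10, 10]
exponent = [2, 1, 2]

# With several sequences, map() calls pow(b, e) for each pair of elements
powers = list(map(pow, base, exponent))
print(powers)  # [25, 10, 100]

# Methods can be passed as functions too
shouted = list(map(str.upper, ["apple", "pear"]))
print(shouted)  # ['APPLE', 'PEAR']
```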

Custom Functions

Custom functions are widely used in Python to reuse chunks of code efficiently. Define your functions with def; the logic is very similar to that of R:

a = [10, 14, 7.6, 18, 22, 50, 0.5]
b = [700, 131, 215, 133.2, 190, 4100, 108.9]
c = [-4.2, -10.2, 2, -15]

def zScore(vect):
    vect = np.array(vect)
    mu = np.mean(vect)
    sigma = np.std(vect)
    return (vect - mu) / sigma

zScore(a).round(3)
array([-0.503, -0.233, -0.665,  0.038,  0.308,  2.201, -1.145])
zScore(b).round(3)
array([-0.071, -0.489, -0.427, -0.487, -0.446,  2.425, -0.505])
zScore(c).round(3)
array([ 0.415, -0.525,  1.386, -1.277])
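One detail to keep in mind when comparing with R: np.std() divides by n by default (ddof=0), whereas R's sd() divides by n - 1. The sketch below shows how to get the sample version; zScoreSample is a hypothetical variant, not a function from the original slides.

```python
import numpy as np

c = [-4.2, -10.2, 2, -15]

sd_population = np.std(c)       # default ddof=0: divides by n
sd_sample = np.std(c, ddof=1)   # ddof=1: divides by n - 1, as R's sd() does

print(round(sd_population, 3), round(sd_sample, 3))

def zScoreSample(vect):  # hypothetical variant of zScore using the sample SD
    vect = np.array(vect)
    return (vect - np.mean(vect)) / np.std(vect, ddof=1)

print(zScoreSample(c).round(3))
```

With only four values the two versions differ noticeably; the difference shrinks as n grows.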

Custom Functions with def

Let’s extend the custom zScore function a little, adding another argument that lets us specify whether we want to ignore missing values:

myVector = np.array([10, 14, 7.6, np.nan, 18, 22, 50, 0.5, np.nan, 1.4, 7])

def zScore(vect, naIgnore=True):
    vect = np.array(vect)
    if naIgnore: 
        vect = vect[~np.isnan(vect)]
    mu = np.mean(vect)
    sigma = np.std(vect)
    return (vect - mu) / sigma

zScore(myVector, naIgnore=False).round(2)
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
zScore(myVector, naIgnore=True).round(2)
array([-0.32, -0.04, -0.49,  0.25,  0.53,  2.5 , -0.98, -0.92, -0.53])
The ~ operator is the elementwise equivalent of not for boolean arrays.
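Note that with naIgnore=True the output is shorter than the input, because missing values are dropped. If you would rather keep the original length, one possible alternative is to use NumPy's NaN-aware functions. This is a sketch under that assumption; zScoreKeepNaN is a hypothetical name.

```python
import numpy as np

myVector = np.array([10, 14, 7.6, np.nan, 18, 22, 50, 0.5, np.nan, 1.4, 7])

def zScoreKeepNaN(vect):  # hypothetical name, not part of the original slides
    vect = np.array(vect, dtype=float)
    mu = np.nanmean(vect)     # mean computed ignoring NaN
    sigma = np.nanstd(vect)   # SD computed ignoring NaN (ddof=0, like np.std)
    return (vect - mu) / sigma

z = zScoreKeepNaN(myVector)
print(len(z))             # 11: same length as the input
print(np.isnan(z).sum())  # 2: missing values stay in their positions
```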

Supercompact Custom Functions with lambda

The lambda keyword lets you define a function in a single line of code, without def or return. It may be useful for quick transformations, but the body must be a single expression, so it cannot contain statements or multi-step logic:

myVector = np.array([10, 14, 7.6, 18, 22, 50, 0.5, 1.4, 7])

zScore = lambda x: (x - x.mean()) / x.std()

zScore(myVector)
array([-0.31638231, -0.03515359, -0.48511954,  0.24607513,  0.52730384,
        2.49590486, -0.98430051, -0.92102405, -0.52730384])
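lambda functions are most often passed directly as arguments to other functions, for example as a sorting key. A short sketch (by_length and classify are illustrative names):

```python
words = ["this", "is", "a", "vector", "of", "strings"]

# Sort by word length instead of alphabetically
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['a', 'is', 'of', 'this', 'vector', 'strings']

# A conditional expression is still a single expression, so it fits in a lambda
classify = lambda age: "Adult" if age >= 18 else "Minor"
print(classify(20), classify(15))  # Adult Minor
```

For anything longer than this, a regular def is usually easier to read and to debug.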

Conditional programming: Python vs R

| Task | Python | R |
|---|---|---|
| Basic if | if cond: | if(cond){ } |
| if … else | if cond: ... else: ... | if(cond){ } else { } |
| Multiple conditions | if cond1: ... elif cond2: ... else: ... | if(cond1){ } else if(cond2){ } else { } |
| Block delimiter | indentation | { } |
| “not” elementwise | ~cond | !cond |
| Multiple checks | (a > 1) & (b < 5) | (a > 1) & (b < 5) |
| Vectorized condition | np.where(conds, ifT, ifF) | ifelse(conds, ifT, ifF) |
| Multiple/nested vectorized conditions | np.select([...], [...]) | dplyr::case_when() |

Loops and Functions: Python vs R

| Task | Python | R |
|---|---|---|
| Loop over integers | for i in range(n): | for(i in 1:n){ } |
| Loop over elements | for a in A: | for(a in A){ } |
| While loop | while cond: | while(cond){ } |
| Block delimiter | indentation | { } |
| Break loop | break | break |
| Apply function (list) | list(map(func, A)) | lapply(A, func) |
| Multilist iteration | for a, b in zip(A, B): | mapply(FUN, A, B) |
| List comprehension | [func(a) for a in A] | lapply(...) |
| Function | def myFunc(a): ... return ... | myFunc = function(a){ ... return(...) } |
| Supercompact function | lambda a: a + 1 | function(a) a + 1 |